Goto

Collaborating Authors

 oracle inequality


Parameter-Free Online Learning via Model Selection

Neural Information Processing Systems

We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and $\mathbb{R}^{d}$ with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel multi-scale algorithm for prediction with expert advice based on random playout, which may be of independent interest.


Self-Regularized Learning Methods

Schölpple, Max, Fanghui, Liu, Steinwart, Ingo

arXiv.org Machine Learning

We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations that many algorithms, such as gradient-descent based learning, exhibit implicit regularization. In a nutshell, for a self-regularized algorithm the complexity of the predictor is inherently controlled by that of the simplest comparator achieving the same empirical risk. This framework is sufficiently rich to cover both classical regularized empirical risk minimization and gradient descent. Building on self-regularization, we provide a thorough statistical analysis of such algorithms including minmax-optimal rates, where it suffices to show that the algorithm is self-regularized -- all further requirements stem from the learning problem itself. Finally, we discuss the problem of data-dependent hyperparameter selection, providing a general result which yields minmax-optimal rates up to a double logarithmic factor and covers data-driven early stopping for RKHS-based gradient descent.


Variational Approximations for Robust Bayesian Inference via Rho-Posteriors

Khribch, EL Mahdi, Alquier, Pierre

arXiv.org Machine Learning

The $ρ$-posterior framework provides universal Bayesian estimation with explicit contamination rates and optimal convergence guarantees, but has remained computationally difficult due to an optimization over reference distributions that precludes intractable posterior computation. We develop a PAC-Bayesian framework that recovers these theoretical guarantees through temperature-dependent Gibbs posteriors, deriving finite-sample oracle inequalities with explicit rates and introducing tractable variational approximations that inherit the robustness properties of exact $ρ$-posteriors. Numerical experiments demonstrate that this approach achieves theoretical contamination rates while remaining computationally feasible, providing the first practical implementation of $ρ$-posterior inference with rigorous finite-sample guarantees.


Covariance-Driven Regression Trees: Reducing Overfitting in CART

Zhang, Likun, Ma, Wei

arXiv.org Machine Learning

Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART are prone to overfitting, especially when grown deep or the sample size is small. Conventional methods to reduce overfitting include pre-pruning and post-pruning, which constrain the growth of uninformative branches. In this paper, we propose a complementary approach by introducing a covariance-driven splitting criterion for regression trees (CovRT). This method is more robust to overfitting than the empirical risk minimization criterion used in CART, as it produces more balanced and stable splits and more effectively identifies covariates with true signals. We establish an oracle inequality of CovRT and prove that its predictive accuracy is comparable to that of CART in high-dimensional settings. We find that CovRT achieves superior prediction accuracy compared to CART in both simulations and real-world tasks.


Tessellation Localized Transfer learning for nonparametric regression

Halconruy, Hélène, Bobbia, Benjamin, Lejamtel, Paul

arXiv.org Machine Learning

Transfer learning aims to improve performance on a target task by leveraging information from related source tasks. We propose a nonparametric regression transfer learning framework that explicitly models heterogeneity in the source-target relationship. Our approach relies on a local transfer assumption: the covariate space is partitioned into finitely many cells such that, within each cell, the target regression function can be expressed as a low-complexity transformation of the source regression function. This localized structure enables effective transfer where similarity is present while limiting negative transfer elsewhere. We introduce estimators that jointly learn the local transfer functions and the target regression, together with fully data-driven procedures that adapt to unknown partition structure and transfer strength. We establish sharp minimax rates for target regression estimation, showing that local transfer can mitigate the curse of dimensionality by exploiting reduced functional complexity. Our theoretical guarantees take the form of oracle inequalities that decompose excess risk into estimation and approximation terms, ensuring robustness to model misspecification. Numerical experiments illustrate the benefits of the proposed approach.


Oracle Inequalities for Model Selection in Offline Reinforcement Learning

Neural Information Processing Systems

In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods in practice is the lack of both theoretically principled and practical tools for model selection and evaluation. To address this, we study the problem of model selection in offline RL with value function approximation. The learner is given a nested sequence of model classes to minimize squared Bellman error and must select among these to achieve a balance between approximation and estimation error of the classes. We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors. The algorithm, ModBE, takes as input a collection of candidate model classes and a generic base offline RL algorithm. By successively eliminating model classes using a novel one-sided generalization test, ModBE returns a policy with regret scaling with the complexity of the minimally complete model class. In addition to its theoretical guarantees, it is conceptually simple and computationally efficient, amounting to solving a series of square loss regression problems and then comparing relative square loss between classes. We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.


Parameter-Free Online Learning via Model Selection

Neural Information Processing Systems

We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and $\mathbb{R}^{d}$ with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel multi-scale algorithm for prediction with expert advice based on random playout, which may be of independent interest.


Parameter-Free Online Learning via Model Selection

Dylan J. Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan

Neural Information Processing Systems

Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel "multi-scale" algorithm for prediction with expert advice based on random playout, which may be of


Data driven estimation of Laplace-Beltrami operator

Frederic Chazal, Ilaria Giulini, Bertrand Michel

Neural Information Processing Systems

Approximations of Laplace-Beltrami operators on manifolds through graph Lapla-cians have become popular tools in data analysis and machine learning. These discretized operators usually depend on bandwidth parameters whose tuning remains a theoretical and practical problem.


Structure-Blind Signal Recovery

Dmitry Ostrovsky, Zaid Harchaoui, Anatoli Juditsky, Arkadi S. Nemirovski

Neural Information Processing Systems

We consider the problem of recovering a signal observed in Gaussian noise. If the set of signals is convex and compact, and can be specified beforehand, one can use classical linear estimators that achieve a risk within a constant factor of the minimax risk. However, when the set is unspecified, designing an estimator that is blind to the hidden structure of the signal remains a challenging problem. We propose a new family of estimators to recover signals observed in Gaussian noise. Instead of specifying the set where the signal lives, we assume the existence of a well-performing linear estimator. Proposed estimators enjoy exact oracle inequalities and can be efficiently computed through convex optimization.